Chinese Novelty Mining

Authors

  • Yi Zhang
  • Flora S. Tsai
Abstract

Automatically mining novel documents or sentences from a chronologically ordered stream of documents or sentences is an open challenge in text mining. In this paper, we describe preprocessing techniques for detecting novel Chinese text and discuss the influence of different part-of-speech (POS) filtering rules on detection performance. Experimental results on the APWSJ and TREC 2004 Novelty Track data show that Chinese novelty mining performance is quite different under two dissimilar POS filtering rules. Thus, the selection of words used to represent Chinese text is vital to the success of Chinese novelty mining. Moreover, we compare Chinese novelty mining performance with that of English and investigate the impact of preprocessing steps on detecting novel Chinese text, which should be helpful for developing a Chinese novelty mining system.
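To make the role of POS filtering concrete, the following is a minimal sketch and not the authors' implementation. It assumes sentences arrive already segmented and POS-tagged as `(word, tag)` pairs, uses cosine similarity against previously seen sentences as the novelty metric, and defines two hypothetical filtering rules (`RULE_NOUNS_VERBS`, `RULE_NOUNS_ONLY`). The tag names, the threshold, and the example sentences are all illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical POS filtering rules; the tag sets are illustrative assumptions,
# not the rules evaluated in the paper.
RULE_NOUNS_VERBS = {"n", "v"}
RULE_NOUNS_ONLY = {"n"}


def filter_by_pos(tagged_sentence, keep_tags):
    """Keep only the words whose POS tag is in keep_tags."""
    return [word for word, tag in tagged_sentence if tag in keep_tags]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mine_novel(tagged_sentences, keep_tags, threshold):
    """Flag each sentence as novel if its maximum similarity to every
    earlier sentence (in chronological order) is below the threshold."""
    history = []
    flags = []
    for tagged in tagged_sentences:
        vec = Counter(filter_by_pos(tagged, keep_tags))
        max_sim = max((cosine(vec, seen) for seen in history), default=0.0)
        flags.append(max_sim < threshold)
        history.append(vec)
    return flags


# Assumed input: pre-segmented, pre-tagged sentences (n = noun, v = verb, a = adjective).
sentences = [
    [("公司", "n"), ("发布", "v"), ("新", "a"), ("产品", "n")],
    [("公司", "n"), ("推出", "v"), ("新", "a"), ("产品", "n")],
    [("股价", "n"), ("上涨", "v")],
]

flags_nouns_verbs = mine_novel(sentences, RULE_NOUNS_VERBS, threshold=0.7)
flags_nouns_only = mine_novel(sentences, RULE_NOUNS_ONLY, threshold=0.7)
print(flags_nouns_verbs)
print(flags_nouns_only)
```

In this toy input, the second sentence differs from the first only in its verb. Keeping nouns and verbs leaves enough difference for it to be flagged as novel, while keeping nouns only makes it identical to the first sentence, so it is flagged as redundant. This mirrors, in a simplified way, how the choice of POS filtering rule can change the detection outcome.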

Similar resources

Adaptable Services for Novelty Mining

Novelty mining is the process of mining relevant information on a given topic. However, designing adaptable services for real-world novelty mining faces several challenges like real-time processing of incoming documents, computational efficiency, multi-user working environment, diverse system requirements, and integration of domain knowledge from different users. In this paper, the authors brid...

Mobile Novelty Mining

Service-oriented Web applications allow users to exploit applications over networks and access them from a remote system at the client side, including mobile phones. Individual services are built separately with comprehensive functionalities. In this article, the authors transform a standalone offline novelty mining application into a service-oriented application and allow users to access it ov...

Blended metrics for novel sentence mining

With the abundance of raw text documents available on the internet, many articles contain redundant information. Novel sentence mining can discover novel, yet relevant, sentences given a specific topic defined by a user. In real-time novelty mining, an important issue is how to select a suitable novelty metric that quantitatively measures the novelty of a particular sentence. To utilize the ...

Novelty detection: a review - part 1: statistical approaches

Novelty detection is the identification of new or unknown data or signal that a machine learning system is not aware of during training. Novelty detection is one of the fundamental requirements of a good classification or identification system since sometimes the test data contains information about objects that were not known at the time of training the model. In this paper we provide stateof-...

Rejecting the arguments of the sanctity of bitcoin mining and proving its legitimacy by Reward Contract (Joaleh)

Bitcoin soon attracted the attention of experts and the general public around the world, including the Islamic community. Due to the novelty of the subject, although little research has been done to examine the legitimacy of bitcoin mining from the perspective of Muslim thinkers, this paper is responsible for examining two reasons in the research of contemporary Sunni thinkers. The two reasons ...

Publication date: 2009